RBCN: Rectified Binary Convolutional Networks with Generative Adversarial Learning 65
where $\circ$ is an operator that obtains the pruned weight with the mask $M_p$. The rest of
the forward propagation in the pruned RBCNs is the same as in the RBCNs.
In pruned RBCNs, the parameters to be learned and updated are the full-precision filters $W_p$,
the learnable matrices $C_p$, and the soft mask $M_p$. In each convolutional layer, these three
sets of parameters are learned jointly.
Update $M_p$. $M_p$ is updated by FISTA [141] with the initialization $\alpha_{(1)} = 1$. Then
we obtain the following:
$$\alpha_{(k+1)} = \frac{1}{2}\left(1 + \sqrt{1 + 4\alpha_{(k)}^2}\right), \tag{3.84}$$
$$y_{(k+1)} = M_{p,(k)} + \frac{\alpha_{(k)} - 1}{\alpha_{(k+1)}}\left(M_{p,(k)} - M_{p,(k-1)}\right), \tag{3.85}$$
$$M_{p,(k+1)} = \mathrm{prox}_{\eta_{(k+1)}\lambda\|\cdot\|_1}\!\left(y_{(k+1)} - \eta_{(k+1)}\frac{\partial (L_{\mathrm{Adv}_p} + L_{\mathrm{Data}_p})}{\partial y_{(k+1)}}\right), \tag{3.86}$$
where $\eta_{(k+1)}$ is the learning rate in iteration $k+1$ and $\mathrm{prox}_{\eta\lambda\|\cdot\|_1}(z_i) = \mathrm{sign}(z_i)\cdot(|z_i| - \eta\lambda)_+$; more details can be found in [142].
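The $M_p$ update above is standard FISTA with a soft-thresholding proximal step. A minimal NumPy sketch, assuming an externally supplied `grad_fn` that evaluates $\partial(L_{\mathrm{Adv}_p} + L_{\mathrm{Data}_p})/\partial y$ at the momentum point; the function names and scalar step size `eta` are illustrative, not from the original:

```python
import numpy as np

def soft_threshold(z, thresh):
    # prox of thresh * ||.||_1: sign(z) * max(|z| - thresh, 0)
    return np.sign(z) * np.maximum(np.abs(z) - thresh, 0.0)

def fista_step(M_curr, M_prev, grad_fn, alpha, eta, lam):
    """One FISTA update of the soft mask, following Eqs. (3.84)-(3.86)."""
    alpha_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * alpha ** 2))      # (3.84)
    y = M_curr + (alpha - 1.0) / alpha_next * (M_curr - M_prev)     # (3.85)
    M_next = soft_threshold(y - eta * grad_fn(y), eta * lam)        # (3.86)
    return M_next, alpha_next
```

Repeated application drives small mask entries exactly to zero, which is what makes the $\ell_1$ proximal step suitable for pruning.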
Update $W_p$. Let $\delta_{W^l_{p,i}}$ be the gradient of the full-precision filter $W^l_{p,i}$. During backpropagation, the gradients first pass to $\hat{W}^l_{p,i}$ and then to $W^l_{p,i}$. Furthermore,
$$\delta_{W^l_{p,i}} = \frac{\partial L_p}{\partial \hat{W}^l_{p,i}} = \frac{\partial L_{S_p}}{\partial \hat{W}^l_{p,i}} + \frac{\partial L_{\mathrm{Adv}_p}}{\partial \hat{W}^l_{p,i}} + \frac{\partial L_{\mathrm{Kernel}_p}}{\partial \hat{W}^l_{p,i}} + \frac{\partial L_{\mathrm{Data}_p}}{\partial \hat{W}^l_{p,i}}, \tag{3.87}$$
and
$$W^l_{p,i} \leftarrow W^l_{p,i} - \eta_{p,1}\,\delta_{W^l_{p,i}}, \tag{3.88}$$
where $\eta_{p,1}$ is the learning rate, and $\frac{\partial L_{\mathrm{Kernel}_p}}{\partial \hat{W}^l_{p,i}}$ and $\frac{\partial L_{\mathrm{Adv}_p}}{\partial \hat{W}^l_{p,i}}$ are
$$\frac{\partial L_{\mathrm{Kernel}_p}}{\partial \hat{W}^l_{p,i}} = -\lambda_1 \left(W^l_{p,i} - C^l_p \hat{W}^l_{p,i}\right) C^l_p, \tag{3.89}$$
$$\frac{\partial L_{\mathrm{Adv}_p}}{\partial \hat{W}^l_{p,i}} = -2\left(1 - D(T^l_{p,i}; Y_p)\right) \frac{\partial D_p}{\partial \hat{W}^l_{p,i}}. \tag{3.90}$$
And
$$\frac{\partial L_{\mathrm{Data}_p}}{\partial \hat{W}^l_{p,i}} = -\frac{1}{n}\left(R_p - T_p\right) \frac{\partial T_p}{\partial \hat{W}^l_{p,i}}. \tag{3.91}$$
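Eq. (3.89) is exactly the gradient one obtains from a quadratic kernel loss of the form $L_{\mathrm{Kernel}_p} = \frac{\lambda_1}{2}\|W - C\hat{W}\|^2$. A small NumPy sketch with a finite-difference check, under the simplifying assumption (for illustration only) that $C^l_p$ is a scalar rather than a learnable matrix:

```python
import numpy as np

# Assumed quadratic form of the kernel loss (illustrative):
# L_Kernel = (lambda1 / 2) * ||W - C * W_hat||^2, with scalar C.
def kernel_loss(W, W_hat, C, lam1):
    return 0.5 * lam1 * np.sum((W - C * W_hat) ** 2)

def grad_kernel_wrt_what(W, W_hat, C, lam1):
    # Eq. (3.89): dL_Kernel/dW_hat = -lambda1 * (W - C * W_hat) * C
    return -lam1 * (W - C * W_hat) * C

# Finite-difference check that the analytic gradient matches the loss.
rng = np.random.default_rng(0)
W, W_hat = rng.normal(size=4), rng.normal(size=4)
C, lam1, eps = 1.3, 0.5, 1e-6
numeric = np.array([
    (kernel_loss(W, W_hat + eps * e, C, lam1)
     - kernel_loss(W, W_hat - eps * e, C, lam1)) / (2 * eps)
    for e in np.eye(4)
])
assert np.allclose(numeric, grad_kernel_wrt_what(W, W_hat, C, lam1), atol=1e-5)
```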
Update $C_p$. We further update the learnable matrix $C^l_p$ with $W^l_p$ and $M^l_p$ fixed. Let $\delta_{C^l_p}$
be the gradient of $C^l_p$. Then we have
$$\delta_{C^l_p} = \frac{\partial L_p}{\partial C^l_p} = \frac{\partial L_{S_p}}{\partial C^l_p} + \frac{\partial L_{\mathrm{Adv}_p}}{\partial C^l_p} + \frac{\partial L_{\mathrm{Kernel}_p}}{\partial C^l_p} + \frac{\partial L_{\mathrm{Data}_p}}{\partial C^l_p}, \tag{3.92}$$
and
$$C^l_p \leftarrow C^l_p - \eta_{p,2}\,\delta_{C^l_p}, \tag{3.93}$$
where $\eta_{p,2}$ is the learning rate, and $\frac{\partial L_{\mathrm{Kernel}_p}}{\partial C^l_p}$ and $\frac{\partial L_{\mathrm{Adv}_p}}{\partial C^l_p}$ are
$$\frac{\partial L_{\mathrm{Kernel}_p}}{\partial C^l_p} = -\lambda_1 \sum_i \left(W^l_{p,i} - C^l_p \hat{W}^l_{p,i}\right) \hat{W}^l_{p,i}, \tag{3.94}$$
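The same quadratic view reproduces Eq. (3.94): differentiating $\frac{\lambda_1}{2}\sum_i \|W^l_{p,i} - C^l_p \hat{W}^l_{p,i}\|^2$ with respect to $C^l_p$ gives the summed term above. A sketch pairing that gradient with the descent step of Eq. (3.93), again assuming a scalar $C$ purely for illustration ($C^l_p$ is a learnable matrix in RBCN):

```python
import numpy as np

# Illustrative quadratic kernel loss with scalar C (an assumption):
# L_Kernel = (lambda1 / 2) * sum_i ||W_i - C * W_hat_i||^2
def kernel_loss(W, W_hat, C, lam1):
    return 0.5 * lam1 * np.sum((W - C * W_hat) ** 2)

def grad_kernel_wrt_c(W, W_hat, C, lam1):
    # Eq. (3.94): dL_Kernel/dC = -lambda1 * sum_i (W_i - C*W_hat_i) * W_hat_i
    return -lam1 * np.sum((W - C * W_hat) * W_hat)

def update_c(C, delta_C, eta_p2):
    # Eq. (3.93): gradient-descent step on C with learning rate eta_p2
    return C - eta_p2 * delta_C

# Finite-difference check of the analytic gradient in C.
rng = np.random.default_rng(1)
W, W_hat = rng.normal(size=5), rng.normal(size=5)
C, lam1, eps = 0.7, 0.5, 1e-6
numeric = (kernel_loss(W, W_hat, C + eps, lam1)
           - kernel_loss(W, W_hat, C - eps, lam1)) / (2 * eps)
assert np.isclose(numeric, grad_kernel_wrt_c(W, W_hat, C, lam1), atol=1e-6)
```

Because $W^l_p$ and $M^l_p$ are held fixed during this step, the $C$ update is a simple one-dimensional descent per layer in this scalar sketch.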